Visual Data Mining
Occlusion is one of the major problems in interactive visual knowledge discovery and data mining when searching for patterns in multidimensional data. This project proposes GLC-S, a hybrid method that combines visual and analytical means to deal with occlusion in visual knowledge discovery. GLC-S visualizes n-D data in 2D in a set of Shifted Paired Coordinates (SPC). A set of Shifted Paired Coordinates for n-D data consists of n/2 pairs of common Cartesian coordinates that are shifted relative to each other to avoid overlap. Each n-D point A is represented as a directed graph A* in SPC, where each node is the 2D projection of A onto the respective pair of Cartesian coordinates.
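As a rough illustration of the SPC mapping described above, the following Python sketch computes the nodes and directed edges of the graph A* for one n-D point. It assumes coordinates are paired in order, (x1, x2), (x3, x4), and so on, and that each pair's origin is offset by a caller-supplied shift. The function names, the shift values, and the padding of odd-dimensional points by repeating the last coordinate are illustrative assumptions, not details taken from the GLC-S work.

```python
from typing import List, Sequence, Tuple

Point2D = Tuple[float, float]


def spc_nodes(a: Sequence[float], shifts: Sequence[Point2D]) -> List[Point2D]:
    """Map an n-D point to its SPC nodes.

    Coordinates are paired as (a[0], a[1]), (a[2], a[3]), ...; pair i is drawn
    in a Cartesian system whose origin is shifted by shifts[i].
    """
    if len(a) % 2 != 0:
        # Assumption: pad odd-dimensional data by repeating the last coordinate.
        a = list(a) + [a[-1]]
    pairs = len(a) // 2
    if len(shifts) != pairs:
        raise ValueError(f"need {pairs} shifts, got {len(shifts)}")
    return [(a[2 * i] + shifts[i][0], a[2 * i + 1] + shifts[i][1]) for i in range(pairs)]


def spc_graph(a: Sequence[float], shifts: Sequence[Point2D]):
    """Return the nodes of A* and its directed edges, node i -> node i+1."""
    nodes = spc_nodes(a, shifts)
    edges = [(i, i + 1) for i in range(len(nodes) - 1)]
    return nodes, edges


# A 6-D point drawn in three coordinate pairs, each shifted further right and up.
shifts = [(0.0, 0.0), (2.0, 1.0), (4.0, 2.0)]
nodes, edges = spc_graph([0.25, 0.5, 0.75, 1.0, 0.5, 0.25], shifts)
print(nodes)  # [(0.25, 0.5), (2.75, 2.0), (4.5, 2.25)]
print(edges)  # [(0, 1), (1, 2)]
```

Drawing the edges as arrows between consecutive nodes gives the directed graph A*; choosing shifts large enough to separate the pairs is what keeps the coordinate systems from overlapping.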
The proposed GLC-S method significantly decreases the cognitive load of analyzing n-D data and simplifies pattern discovery in such data. GLC-S iteratively splits n-D data into non-overlapping clusters (hyper-rectangles) around local centers and, at each iteration, visualizes only the data within these clusters. Each cluster must contain cases of only one class and must be the largest cluster with this property in the SPC visualization.
Such sequential splitting allows (1) avoiding occlusion, (2) finding local visual classification patterns and rules, and (3) combining local sub-rules into a global rule that classifies all given data of two or more classes. Computational experiments with the Wisconsin Breast Cancer data (9-D), User Knowledge Modeling data (6-D), and Letter Recognition data (17-D) from the UCI Machine Learning Repository confirm this capability. At each iteration, the data were split into training (70%) and validation (30%) sets. GLC-S required 3 iterations on the Wisconsin Breast Cancer data, 4 on User Knowledge Modeling, and 5 on Letter Recognition, producing 3, 4, and 5 local sub-rules, respectively, that covered over 95% of all n-D data points with 100% accuracy in both training and validation. After each iteration, the data used in that iteration are removed and the remaining data are used in the next iteration; this removal also helps to decrease occlusion. The GLC-S algorithm refuses to classify the remaining cases that are not covered by these rules, i.e., cases that do not belong to any of the found hyper-rectangles. The interactive visualization process in SPC allows adjusting the sides of each hyper-rectangle to maximize its size without overlapping the hyper-rectangles of the opposing classes.
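The covering loop described above can be approximated in code, with one important caveat: GLC-S relies on interactive visual adjustment of the hyper-rectangles in SPC, whereas this sketch substitutes a purely automatic greedy heuristic. It grows the bounding box of same-class cases around a candidate center (nearest first), skips any case whose inclusion would pull an opposing-class case inside, keeps the box that covers the most remaining cases, removes those cases, and repeats. The function names, the try-every-case choice of local center, and the `min_coverage` stopping rule are assumptions for illustration. Cases outside every rule are left unclassified, mirroring the refusal behavior described above.

```python
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]
Rule = Tuple[List[float], List[float], str]  # (lower corner, upper corner, class label)


def inside(p: Vector, lo: Sequence[float], hi: Sequence[float]) -> bool:
    """True if point p lies in the closed hyper-rectangle [lo, hi]."""
    return all(l <= v <= h for v, l, h in zip(p, lo, hi))


def grow_pure_box(c: int, X: Sequence[Vector], y: Sequence[str]) -> Rule:
    """Grow the bounding box of same-class cases around X[c], nearest first,
    skipping any case whose inclusion would put an opposing-class case inside."""
    label = y[c]
    lo, hi = list(X[c]), list(X[c])
    opposing = [X[j] for j in range(len(X)) if y[j] != label]
    same = sorted((j for j in range(len(X)) if y[j] == label),
                  key=lambda j: sum((a - b) ** 2 for a, b in zip(X[j], X[c])))
    for j in same:
        new_lo = [min(a, b) for a, b in zip(lo, X[j])]
        new_hi = [max(a, b) for a, b in zip(hi, X[j])]
        if not any(inside(p, new_lo, new_hi) for p in opposing):
            lo, hi = new_lo, new_hi
    return lo, hi, label


def learn_rules(X: Sequence[Vector], y: Sequence[str],
                max_iter: int = 5, min_coverage: float = 0.95) -> List[Rule]:
    """Find the largest pure box among remaining cases, remove what it covers, repeat."""
    remaining = list(range(len(X)))
    rules: List[Rule] = []
    for _ in range(max_iter):
        if len(remaining) <= (1 - min_coverage) * len(X):
            break  # coverage target reached
        sub_X = [X[i] for i in remaining]
        sub_y = [y[i] for i in remaining]
        best_rule, best_cov = None, []
        for c in range(len(sub_X)):  # assumption: try every remaining case as a local center
            lo, hi, label = grow_pure_box(c, sub_X, sub_y)
            covered = [i for i in remaining if inside(X[i], lo, hi)]
            if len(covered) > len(best_cov):
                best_rule, best_cov = (lo, hi, label), covered
        rules.append(best_rule)
        covered_set = set(best_cov)
        remaining = [i for i in remaining if i not in covered_set]
    return rules


def classify(p: Vector, rules: Sequence[Rule]) -> Optional[str]:
    """Apply rules in the order found; return None (refuse) if no box covers p."""
    for lo, hi, label in rules:
        if inside(p, lo, hi):
            return label
    return None
```

Rules are applied in the order they were found because each box is only guaranteed pure with respect to the cases still remaining when it was built. Trying every case as a center costs roughly cubic time in the number of remaining cases per iteration, which is fine for a sketch but not for large data.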
GLC-S splits the data using a fixed split of the n coordinates into pairs. This hybrid visual and analytical approach avoids throwing all data of several classes into a single plot, which typically ends up as a messy, highly occluded picture that hides useful patterns; instead, it allows these hidden patterns to be revealed.
The visualization process in SPC is reversible (lossless), i.e., all n-D information is visualized in 2D and can be restored from the 2D visualization for each n-D case. This hybrid visual analytics method allows classifying n-D data in a way that can be communicated to users in an understandable, visual form.
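Because each SPC node only adds a known shift to a pair of original coordinates, the mapping can be inverted exactly when the shifts are known. The sketch below checks this round trip; the function names and shift values are illustrative assumptions, and padding of odd-dimensional points is left out for brevity.

```python
from typing import List, Sequence, Tuple

Point2D = Tuple[float, float]


def spc_encode(a: Sequence[float], shifts: Sequence[Point2D]) -> List[Point2D]:
    """Forward SPC map: node i is the pair (a[2i], a[2i+1]) offset by shifts[i]."""
    if len(a) != 2 * len(shifts):
        raise ValueError("expected exactly one shift per coordinate pair")
    return [(a[2 * i] + dx, a[2 * i + 1] + dy) for i, (dx, dy) in enumerate(shifts)]


def spc_decode(nodes: Sequence[Point2D], shifts: Sequence[Point2D]) -> List[float]:
    """Inverse map: remove each pair's shift and concatenate the pairs into an n-D point."""
    a: List[float] = []
    for (x, y), (dx, dy) in zip(nodes, shifts):
        a.extend([x - dx, y - dy])
    return a


shifts = [(0.0, 0.0), (2.0, 1.0), (4.0, 2.0)]
point = [0.25, 0.5, 0.75, 1.0, 0.5, 0.25]
restored = spc_decode(spc_encode(point, shifts), shifts)
print(restored == point)  # True
```

The example values are chosen so the floating-point round trip is exact; with arbitrary floats, `(a + s) - s` can differ from `a` in the last bits, so a comparison with a tolerance is the safer check.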
On the challenges and opportunities in visualization for machine learning and knowledge extraction: A research agenda
We describe a selection of challenges at the intersection of machine learning and data visualization and outline a subjective research agenda based on professional and personal experience. The unprecedented increase in the amount, variety, and value of data has been significantly transforming the way that scientific research is carried out and businesses operate. Within data science, which has emerged as a practice that enables this data-intensive innovation by gathering together and advancing knowledge from fields such as statistics, machine learning, knowledge extraction, data management, and visualization, visualization plays a unique and perhaps ultimate role: it facilitates human-computer cooperation and, in particular, enables the analysis of diverse and heterogeneous data using complex computational methods whose algorithmic results are challenging to interpret and operationalize. Whilst algorithm development is surely at the center of the whole pipeline in disciplines such as Machine Learning and Knowledge Discovery, it is visualization that ultimately makes the results accessible to the end user. Visualization can thus be seen as a mapping from arbitrarily high-dimensional abstract spaces to lower dimensions, and it plays a central and critical role in interacting with machine learning algorithms, particularly in interactive machine learning (iML) with a human in the loop. The central goal of the CD-MAKE VIS workshop is to spark discussion at this intersection of visualization, machine learning, and knowledge discovery and to bring together experts from these disciplines. This paper discusses the challenges and opportunities in integrating these disciplines and presents a number of directions and strategies for further research.
Exploratory topic modeling with distributional semantics
As we continue to collect and store textual data in a multitude of domains, we are regularly confronted with material whose largely unknown thematic structure we want to uncover. With unsupervised, exploratory analysis, no prior knowledge about the content is required and highly open-ended tasks can be supported. In the past few years, probabilistic topic modeling has emerged as a popular approach to this problem. Nevertheless, the representation of the latent topics as aggregations of semi-coherent terms limits their interpretability and level of detail.
This paper presents an alternative approach to topic modeling that maps topics as a network for exploration, based on distributional semantics using learned word vectors. From the granular level of terms and their semantic similarity relations, global topic structures emerge as clustered regions and gradients of concepts. Moreover, the paper discusses the visual interactive representation of the topic map, which plays an important role in supporting its exploration.
Comment: Conference: The Fourteenth International Symposium on Intelligent Data Analysis (IDA 2015)
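The abstract does not spell out how the term network is built, so the following is only a simple stand-in for the idea of topics emerging from term-level similarity: terms are linked when the cosine similarity of their vectors reaches a threshold, and connected components of the resulting network play the role of "clustered regions". The toy 2-D vectors, the 0.9 threshold, and the use of connected components are all assumptions for illustration; the paper works with learned word vectors and its own layout and clustering.

```python
import math
from typing import Dict, List, Sequence, Set


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def term_network(vectors: Dict[str, Sequence[float]], threshold: float) -> Dict[str, Set[str]]:
    """Link every pair of terms whose vectors have cosine similarity >= threshold."""
    terms = sorted(vectors)
    adj: Dict[str, Set[str]] = {t: set() for t in terms}
    for i, s in enumerate(terms):
        for t in terms[i + 1:]:
            if cosine(vectors[s], vectors[t]) >= threshold:
                adj[s].add(t)
                adj[t].add(s)
    return adj


def topic_regions(adj: Dict[str, Set[str]]) -> List[List[str]]:
    """Connected components of the term network, a stand-in for topic regions."""
    seen: Set[str] = set()
    regions: List[List[str]] = []
    for start in sorted(adj):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            node = stack.pop()
            component.append(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        regions.append(sorted(component))
    return regions


# Toy 2-D "embeddings" (hypothetical, not learned from any corpus).
vectors = {"cat": [1.0, 0.1], "dog": [0.9, 0.2], "stock": [0.1, 1.0], "bond": [0.2, 0.9]}
print(topic_regions(term_network(vectors, threshold=0.9)))  # [['bond', 'stock'], ['cat', 'dog']]
```

A hard threshold yields crisp regions only; the "gradients of concepts" mentioned in the abstract would need a weighted layout of the full similarity graph rather than discrete components.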
Influence of vaccine-preventable diseases and HIV infection on demand for an infectious diseases service in Rio de Janeiro State, Brazil, over 22 years – Part II (1995-2016)
Patients’ data collected during daily clinical care are extremely important for improving the allocation of healthcare resources and for assessing healthcare demands. The prospective gathering of these data over decades allowed us to describe the trends of infectious diseases in a tertiary hospital. Our results for the period between 1965 and 1994 described the exponential increase in the incidence of HIV infection and its important effects on our institutional mortality. The present study describes the demand on the same hospital between 1995 and 2016. There were 4,691 admissions, and the main causes of admission were, in descending order, HIV infection (1,312, 28.0%), noninfectious diseases (447, 9.5%), meningoencephalitis (432, 9.2%), soft tissue infections (427, 9.1%), tuberculosis (272, 5.8%), pneumonias (212, 4.5%), and leptospirosis (212, 4.5%). There were 864 readmissions, most of them due to HIV infection (65.2%). Institutional mortality fell from 16.9% in the first two years to 5.0% in the last two years of the study. The case-fatality rate among HIV patients decreased from more than 40% to approximately 5% over the study period. In the last two decades, the hospital experienced a decrease in demand due to vaccine-preventable diseases. Demand from children has fallen, while demand from patients over the age of 50 has increased. These results reflect the improvement in public health standards over more than half a century and the positive effects of the National Immunization Program. They also illustrate the sharp decline in the HIV case-fatality rate after the introduction of combined antiretroviral therapy.
Visual Analytics for Network Security and Critical Infrastructures
A comprehensive analysis of cyber attacks is important for a better understanding of their nature and origin. Providing sufficient insight into such a vast amount of diverse (and sometimes seemingly unrelated) data is a task suited neither to humans nor to fully automated algorithms alone. Not only a combination of the two approaches but also a continuous reasoning process capable of generating a sufficient knowledge base is indispensable for a better understanding of the events. Our research focuses on designing new exploratory methods and interactive visualizations in the context of network security. The knowledge generation loop is important because it helps analysts refine their understanding of the processes that continuously occur and offers them better insight into network security events. In this paper, we formulate the research questions that relate to the proposed solution.
The four faces of information visualization: A conceptual framework for a postgraduate program
The multidisciplinary nature of information visualization is today fairly consensual in both professional and academic communities: data analysis, information design, and storytelling, among other subjects, are common drivers in this field. The systematic study of this cross-fertilization, evident in the way the concept's definition varies with the perspective adopted, is an important and needed addition to the critical mass of a relatively recent area of knowledge. Since proposing a single unified definition of information visualization is beyond the scope of this paper, it instead gathers and discusses the field's multiple viewpoints to help design a postgraduate program on the topic, while also aiming to start an open debate as the program's implementation proceeds and new questions are raised.
Evaluation of two interaction techniques for visualization of dynamic graphs
Several techniques for visualization of dynamic graphs are based on different spatial arrangements of a temporal sequence of node-link diagrams. Many studies in the literature have investigated the importance of maintaining the user's mental map across this temporal sequence, but usually each layout is considered as a static graph drawing and the effect of user interaction is disregarded. We conducted a task-based controlled experiment to assess the effectiveness of two basic interaction techniques: the adjustment of the layout stability and the highlighting of adjacent nodes and edges. We found that generally both interaction techniques increase accuracy, sometimes at the cost of longer completion times, and that the highlighting outclasses the stability adjustment for many tasks except the most complex ones.
Comment: Appears in the Proceedings of the 24th International Symposium on Graph Drawing and Network Visualization (GD 2016)
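The two interaction techniques compared above can be sketched in a few lines, although the abstract does not say how either was implemented in the study. In this sketch, highlighting returns the neighbors of a selected node and its incident edges, and the stability control is modeled, as an assumption, by linearly blending each snapshot's own layout with a shared reference layout. The function names and the `stability` parameter are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]
Layout = Dict[str, Tuple[float, float]]


def highlight(selected: str, edges: Iterable[Edge]) -> Tuple[Set[str], Set[Edge]]:
    """Return the nodes adjacent to `selected` and the edges incident to it."""
    nodes: Set[str] = set()
    incident: Set[Edge] = set()
    for u, v in edges:
        if selected in (u, v):
            incident.add((u, v))
            nodes.add(v if u == selected else u)
    return nodes, incident


def blend_layout(own: Layout, reference: Layout, stability: float) -> Layout:
    """Hypothetical stability control: 0 keeps the snapshot's own layout,
    1 pins every node to the shared reference layout."""
    if not 0.0 <= stability <= 1.0:
        raise ValueError("stability must be in [0, 1]")
    s = stability
    return {n: ((1 - s) * x + s * reference[n][0], (1 - s) * y + s * reference[n][1])
            for n, (x, y) in own.items()}


edges = [("a", "b"), ("b", "c"), ("c", "d")]
print(sorted(highlight("b", edges)[0]))  # ['a', 'c']
print(blend_layout({"a": (0.0, 0.0)}, {"a": (4.0, 2.0)}, 0.5))  # {'a': (2.0, 1.0)}
```

Higher stability makes successive snapshots easier to compare at the cost of each snapshot's own layout quality, which is the trade-off the mental-map literature cited above is concerned with.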
Goal-Based Selection of Visual Representations for Big Data Analytics
The H2020 TOREADOR Project adopts a model-driven architecture to streamline big data analytics and make it widely available to companies as a service. Our work in this context focuses on visualization, in particular on how to automate the translation of the visualization objectives declared by the user into a suitable visualization type. To this end, we first define a visualization context based on seven prioritizable coordinates for assessing the user's objectives and describing the data to be visualized; we then propose a skyline-based technique for automatically translating a visualization context into a set of suitable visualization types. Finally, we evaluate our approach on a real use case excerpted from the pilot applications of TOREADOR.
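The skyline step mentioned above keeps every candidate that no other candidate dominates. The sketch below shows that core operation in generic form. It assumes each visualization type already has a numeric suitability score per coordinate (higher is better); the abstract does not say how TOREADOR derives these scores or applies priorities, so the three coordinates and all scores here are hypothetical (the project itself uses seven coordinates).

```python
from typing import Dict, List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b if it is at least as good on every coordinate and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def skyline(candidates: Dict[str, Sequence[float]]) -> List[str]:
    """Return the visualization types that no other candidate dominates."""
    return sorted(
        name
        for name, score in candidates.items()
        if not any(dominates(other, score) for o, other in candidates.items() if o != name)
    )


# Hypothetical suitability scores on three coordinates (higher is better).
scores = {
    "bar chart": (3, 2, 1),
    "line chart": (2, 3, 1),
    "pie chart": (2, 2, 1),
    "scatter plot": (1, 1, 3),
}
print(skyline(scores))  # ['bar chart', 'line chart', 'scatter plot']
```

The skyline returns a set rather than a single winner: here "pie chart" is dropped because "bar chart" is at least as good everywhere and better on the first coordinate, while the other three each win on some coordinate.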